ABSTRACT


Finite state space semi-Markov processes find application in many areas. Often interest centers on whether or not the process has hit a particular state before a time t. This thesis reports results of a simulation study of the small sample behavior of three estimators of the survival probability of a first passage time for a semi-Markov process using censored data.
I. INTRODUCTION


Finite state space semi-Markov models find applications in a variety of fields such 
as queueing theory, reliability, and clinical trials. Often interest in the application of 
these models centers on the distribution of the first passage time to a state or a set of
states representing, for example, the lifetime of a system or the end of a busy period of 
a server. Suppose that the observations of the path of the semi-Markov process are all 
that is known about the process. The problem is to estimate the probability that the first 
passage time has not occurred before time t. 

Censored data problems arise frequently in medical applications and also in engineering system reliability applications. For example, in medical survivorship studies some subjects may be lost to follow-up, or the available data may be analyzed before all subjects have expired. In the equipment reliability context, observed units may still be in operation at the time of the analysis, perhaps after several previous failures.

Three possible estimators will be considered, and each uses a different amount of information about the process. One estimator uses only the observed first passage times. Another makes parametric assumptions about the sojourn time distributions and uses maximum likelihood. A third approach combines an exponential approximation to the survival probability with empirical distributions that estimate the sojourn time distributions.

The three estimators were investigated for a specific semi-Markov process with uncensored data in Kim [Ref. 1] and Jacobs [Ref. 2]. Results of a simulation study of the three estimators using censored data are reported in Gallagher [Ref. 3]. That study emphasizes the behavior of the point estimates and reports their mean biases and standard errors.

This thesis continues the investigation of the three estimators with censored data, with the emphasis on their confidence intervals. Chapter 2 describes the three estimators and introduces the confidence interval procedures considered for each. Chapter 3 contains the details of the simulation experiment and its results. Chapter 4 gives the conclusions of the study.


II. NATURE OF THE PROBLEM
A. PROBLEM 


The semi-Markov process model is as follows. Suppose we observe N individuals, and let $X_i(t)$ be the state of the $i$th individual at time $t$. We assume that $\{X_i(t); t \ge 0\}$, $i = 1, 2, \ldots, N$, are independent identically distributed semi-Markov processes with the same probability law as $\{X_t; t \ge 0\}$, a semi-Markov process with three states $\{0, 1, 2\}$.

The individuals start in state 1 at $t = 0$. On leaving state 1, the process moves to state 0 with probability $\theta$ and to state 2 with probability $1 - \theta$. From state 2 the process always returns to state 1. State 0 is absorbing, and the first passage time to state 0 is called the time of death.

The N individuals are censored independently, with exponentially distributed censoring times of mean $1/c$. The entire path of transitions and sojourn times is observed until censoring or death.

Let

$$D = \inf\{t \ge 0 : X_t = 0\}$$

and

$$S(t) = P(D > t)$$

where D is the time of death (entrance to state 0). The problem is to estimate the survival probability $P(D > t)$ from the censored data of the N individuals.


B. ESTIMATORS 
1. Kaplan-Meier Estimator 
A non-parametric estimator of the distribution function for censored data is the Kaplan-Meier estimator (K.M.E.), often called the product limit estimator [Ref. 4]. Let $U_1, U_2, \ldots, U_n$ be independent identically distributed random variables with distribution $G$ having a density function. Let $V_1, V_2, \ldots, V_n$ be independent identically distributed censoring times with a continuous distribution function. Let

$$Z_i = \min(U_i, V_i)$$

and

$$\delta_i = \begin{cases} 0 & \text{if } U_i \le V_i, \\ 1 & \text{otherwise.} \end{cases}$$

The $Z_i$ are the observed times, and $\delta_i$ indicates whether the $i$th observation is censored. Let $Z_{(1)} \le Z_{(2)} \le \cdots \le Z_{(n)}$ be the order statistics of $\{Z_i\}$, and let $\delta_{(i)}$ be the corresponding values of $\{\delta_i\}$. Because the underlying distribution functions are continuous, we assume there are no ties. The Kaplan-Meier estimate of the survival function $S(t) = 1 - G(t)$ is

$$\hat S(t) = \begin{cases} \displaystyle\prod_{i:\,Z_{(i)} \le t} \left(\frac{n-i}{n-i+1}\right)^{1-\delta_{(i)}} & \text{if } t \le Z_{(n)}, \\ 0 & \text{if } t > Z_{(n)} \text{ and } \delta_{(n)} = 0, \\ \text{undefined} & \text{if } t > Z_{(n)} \text{ and } \delta_{(n)} = 1. \end{cases} \qquad (2.1)$$

The variance of $\hat S(t)$ is given approximately by

$$\mathrm{Var}[\hat S(t)] \approx \hat S(t)^2 \sum_{i:\,Z_{(i)} \le t} \frac{1-\delta_{(i)}}{(n-i)(n-i+1)} \qquad (2.2)$$

[Ref. 4: p. 464]. The Kaplan-Meier estimator based on the death times of the N individuals is denoted $\hat P_1(D > t)$.
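The sketch below shows the product-limit formula (2.1) together with the variance approximation (2.2). It is a minimal Python illustration, not the thesis code, and the function name `kaplan_meier` is hypothetical. It follows the text's convention that `censored[i] = 1` marks a censored observation.

```python
def kaplan_meier(times, censored, t):
    """Return (S_hat(t), Var[S_hat(t)]) per (2.1)-(2.2), or None if undefined.

    times    -- observed times Z_i (assumed free of ties)
    censored -- delta_i: 1 if observation i is censored, 0 if uncensored
    """
    pairs = sorted(zip(times, censored))      # order statistics Z_(i), delta_(i)
    n = len(pairs)
    z_last, d_last = pairs[-1]
    if t > z_last and d_last == 1:
        return None                           # undefined: largest datum is censored
    s_hat, greenwood = 1.0, 0.0
    for i, (z, d) in enumerate(pairs, start=1):
        if z > t:
            break
        if d == 0:                            # only uncensored points change S_hat
            s_hat *= (n - i) / (n - i + 1)
            if n - i > 0:                     # last term would divide by 0; S_hat is 0 then
                greenwood += 1.0 / ((n - i) * (n - i + 1))
    return s_hat, s_hat ** 2 * greenwood


# Usage: four observations, the second one censored.
print(kaplan_meier([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 0], 3.0))
```

In the usage line the estimate at t = 3 is 0.375, which is (3/4)(1/2).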

When the Kaplan-Meier estimate is undefined, we investigate the effect of two methods, MOD 1 and MOD 2, for making the Kaplan-Meier estimate honest. In MOD 1, the remaining mass of the estimated survival function is assigned to the last datum $Z_{(n)}$, which is censored. In MOD 2, if the last $k$ data points are censored, the remaining mass of the survival function is distributed equally among these $k$ data points. For example, if the estimated survival probability at the last uncensored point is 0.2 and two further data points are censored, each of these points is assigned a mass of $0.2/2 = 0.1$.
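The sketch below illustrates the two modifications. It assumes that MOD 2 spreads the leftover mass over the trailing run of censored points, and that survival at t counts only mass lying strictly beyond t. The helper `km_modified` is hypothetical.

```python
def km_modified(times, censored, t, mod):
    """Kaplan-Meier survival at t, made honest by MOD 1 or MOD 2 (mod = 1 or 2).

    censored[i] = 1 marks a censored observation, as in (2.1).
    """
    pairs = sorted(zip(times, censored))
    n = len(pairs)
    # Ordinary product-limit value at t (constant after the last death).
    s_hat = 1.0
    for i, (z, d) in enumerate(pairs, start=1):
        if z > t:
            break
        if d == 0:
            s_hat *= (n - i) / (n - i + 1)
    # k = number of trailing censored points that leave mass unassigned.
    k = 0
    while k < n and pairs[n - 1 - k][1] == 1:
        k += 1
    if k == 0 or t < pairs[n - k][0]:
        return s_hat                      # t precedes the trailing censored block
    if mod == 1:                          # all remaining mass placed at Z_(n)
        return s_hat if t < pairs[-1][0] else 0.0
    if mod == 2:                          # remaining mass split equally over k points
        passed = sum(1 for z, _ in pairs[n - k:] if z <= t)
        return s_hat * (1 - passed / k)
    raise ValueError("mod must be 1 or 2")


# Usage: survival after the last death is 0.4, followed by two censored points.
T, C = [1.0, 2.0, 3.0, 4.0, 5.0], [0, 0, 0, 1, 1]
print(km_modified(T, C, 4.0, 1), km_modified(T, C, 4.0, 2))
```

At t = 4, MOD 1 still reports 0.4. MOD 2 has already released one of its two 0.2 masses, so it reports 0.2. This illustrates how the modifications can differ at moderate times.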


2. Maximum Likelihood Estimator 
This subsection gives the maximum likelihood estimator (M.L.E.) for the special case in which the sojourn time in state $i$ is exponentially distributed with mean $1/\rho_i$ ($i = 1, 2$). Let $K_{ij}$ be the number of transitions from state $i$ to state $j$ for one individual.

The log likelihood function for an individual is

$$l = K_{12}\ln(1-\theta) + K_{10}\ln\theta + K_{21}\ln\rho_2 + (K_{10} + K_{12})\ln\rho_1 - \rho_1 T_1 - \rho_2 T_2 \qquad (2.3)$$

where $T_i$ ($i = 1, 2$) is the total time spent in state $i$ before death or censoring.

The maximum likelihood estimators based on the data from all N individuals are

$$\hat\theta = \frac{R_{10}}{R_{10} + R_{12}} \qquad (2.4)$$

$$\hat\rho_1 = \frac{R_{10} + R_{12}}{T_1} \qquad (2.5)$$

$$\hat\rho_2 = \frac{R_{21}}{T_2} \qquad (2.6)$$

$$\hat c = \frac{\text{number of censored individuals}}{T_1 + T_2} \qquad (2.7)$$

where

$$R_{ij} = \sum_{n=1}^{N} K_{ij}(n) \qquad (2.8)$$

and

$$T_i = \sum_{n=1}^{N} T_i(n) \qquad (2.9)$$

Here $K_{ij}(n)$ is the number of transitions from $i$ to $j$ for the $n$th individual, and $T_i(n)$ is the total time that individual spends in state $i$ before death or censoring.
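The closed-form estimators (2.4) through (2.9) can be checked with a short sketch. The dictionary layout and the name `mle_exponential` are illustrative assumptions. The sketch reads (2.7) as the number of censored individuals divided by the total observed time.

```python
def mle_exponential(individuals):
    """Closed-form MLEs (2.4)-(2.7) from per-individual summaries.

    Each individual is a dict with transition counts K10, K12, K21, times in
    state T1, T2, and 'censored' (1 if censored, 0 if the death was observed).
    """
    R10 = sum(x["K10"] for x in individuals)          # (2.8)
    R12 = sum(x["K12"] for x in individuals)
    R21 = sum(x["K21"] for x in individuals)
    T1 = sum(x["T1"] for x in individuals)            # (2.9)
    T2 = sum(x["T2"] for x in individuals)
    n_cens = sum(x["censored"] for x in individuals)
    if R10 + R12 == 0 or T2 == 0:
        raise ValueError("theta or rho2 not estimable from these data")
    theta = R10 / (R10 + R12)                         # (2.4)
    rho1 = (R10 + R12) / T1                           # (2.5)
    rho2 = R21 / T2                                   # (2.6)
    c = n_cens / (T1 + T2)                            # (2.7): censoring rate
    return theta, rho1, rho2, c


# Usage: one observed death and one individual censored in state 2.
people = [
    {"K10": 1, "K12": 1, "K21": 1, "T1": 2.0, "T2": 1.0, "censored": 0},
    {"K10": 0, "K12": 1, "K21": 0, "T1": 1.0, "T2": 0.5, "censored": 1},
]
print(mle_exponential(people))
```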


To obtain asymptotic variances for these estimators, note that 
(2.10)


(2.11) 


(2.12) 


(2.13) 


(2.14) 


(2.15) 


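where $c^{-1}$ is the mean of the exponential censoring time. Thus for N individuals the asymptotic variances of the estimators are

$$\mathrm{Var}[\hat\theta] = I(\theta)^{-1}, \qquad \mathrm{Var}[\hat\rho_1] = I(\rho_1)^{-1}, \qquad \mathrm{Var}[\hat\rho_2] = I(\rho_2)^{-1} \qquad (2.16)$$

where

$$I(\theta) = N\left[\frac{E[K_{10}]}{\theta^2} + \frac{E[K_{12}]}{(1-\theta)^2}\right] \qquad (2.17)$$

$$I(\rho_1) = N\,\frac{E[K_{10}] + E[K_{12}]}{\rho_1^2} \qquad (2.18)$$

$$I(\rho_2) = N\,\frac{E[K_{21}]}{\rho_2^2} \qquad (2.19)$$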

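For this continuous time Markov chain, the survival function $S(t) = P(D > t)$ is

$$P(D > t) = \frac{\lambda_2(\lambda_1 + \rho_2)}{\rho_2(\lambda_2 - \lambda_1)}\, e^{\lambda_1 t} - \frac{\lambda_1(\lambda_2 + \rho_2)}{\rho_2(\lambda_2 - \lambda_1)}\, e^{\lambda_2 t} \qquad (2.20)$$

where $\lambda_1$ and $\lambda_2$ are the roots of the equation

$$\theta\rho_1\rho_2 + y(\rho_1 + \rho_2) + y^2 = 0. \qquad (2.21)$$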


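The maximum likelihood estimator of the survival probability, denoted $\hat P_2(D > t)$, is [Ref. 5: p. 5 eqn 1.17]

$$\hat P_2(D > t) = \frac{\hat\lambda_2(\hat\lambda_1 + \hat\rho_2)}{\hat\rho_2(\hat\lambda_2 - \hat\lambda_1)}\, e^{\hat\lambda_1 t} - \frac{\hat\lambda_1(\hat\lambda_2 + \hat\rho_2)}{\hat\rho_2(\hat\lambda_2 - \hat\lambda_1)}\, e^{\hat\lambda_2 t} \qquad (2.22)$$

where $\hat\lambda_1$ and $\hat\lambda_2$ are the roots of the equation

$$\hat\theta\hat\rho_1\hat\rho_2 + y(\hat\rho_1 + \hat\rho_2) + y^2 = 0. \qquad (2.23)$$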
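The closed form (2.20) is easy to evaluate once the roots of (2.21) are known. The sketch below uses the hypothetical name `survival_exponential` and takes $\lambda_1$ to be the root nearer zero. Substituting the estimates (2.4) through (2.6) for $(\theta, \rho_1, \rho_2)$ gives (2.22).

```python
import math


def survival_exponential(t, theta, rho1, rho2):
    """P(D > t) from (2.20)-(2.21); plugging in the MLEs gives (2.22)-(2.23)."""
    b = rho1 + rho2
    disc = b * b - 4.0 * theta * rho1 * rho2         # strictly positive whenever theta < 1
    lam1 = (-b + math.sqrt(disc)) / 2.0              # root nearer 0 (slow decay)
    lam2 = (-b - math.sqrt(disc)) / 2.0
    denom = rho2 * (lam2 - lam1)
    a1 = lam2 * (lam1 + rho2) / denom
    a2 = -lam1 * (lam2 + rho2) / denom
    return a1 * math.exp(lam1 * t) + a2 * math.exp(lam2 * t)


# Usage with the simulation parameters of Chapter III.
print(survival_exponential(5.0, theta=0.5, rho1=1.0, rho2=1.0))
```

Two quick sanity checks follow from the formula. The coefficients sum to 1, so $P(D > 0) = 1$. The slope at $t = 0$ equals $-\theta\rho_1$, the rate of leaving state 1 directly for state 0. With $\rho_1 = \rho_2 = 1$ and $\theta = 0.5$, the function gives $P(D > 5) \approx 0.1974$.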


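Because the maximum likelihood estimators are orthogonal, the asymptotic variance of $\hat P_2(D > t)$ [Ref. 5: p. 5 eqn 1.19] is approximately

$$\mathrm{Var}[\hat P_2(D > t) \mid \theta, \rho_1, \rho_2] \approx \mathrm{Var}[\hat\theta]\left(\frac{\partial P(D>t)}{\partial\theta}\right)^2 + \mathrm{Var}[\hat\rho_1]\left(\frac{\partial P(D>t)}{\partial\rho_1}\right)^2 + \mathrm{Var}[\hat\rho_2]\left(\frac{\partial P(D>t)}{\partial\rho_2}\right)^2 \qquad (2.24)$$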
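The variance (2.24) needs only the three partial derivatives of $P(D > t)$ and the three estimator variances. The sketch below makes two substitutions that are not the thesis method. It uses central finite differences in place of analytic derivatives. It also uses the inverse observed information at the MLE in place of the expected-information variances (2.16) through (2.19). All function names are hypothetical.

```python
import math


def survival_exponential(t, theta, rho1, rho2):
    """P(D > t) from (2.20)-(2.21)."""
    b = rho1 + rho2
    root = math.sqrt(b * b - 4.0 * theta * rho1 * rho2)
    lam1, lam2 = (-b + root) / 2.0, (-b - root) / 2.0
    denom = rho2 * (lam2 - lam1)
    return (lam2 * (lam1 + rho2) * math.exp(lam1 * t)
            - lam1 * (lam2 + rho2) * math.exp(lam2 * t)) / denom


def observed_info_variances(R10, R12, R21, theta, rho1, rho2):
    """Inverse observed information at the MLE (stand-in for (2.16)-(2.19))."""
    m = R10 + R12
    return theta * (1 - theta) / m, rho1 ** 2 / m, rho2 ** 2 / R21


def delta_method_variance(t, params, variances, h=1e-6):
    """Approximate (2.24) with central-difference partial derivatives."""
    total = 0.0
    for j, var in enumerate(variances):
        up, dn = list(params), list(params)
        up[j] += h
        dn[j] -= h
        grad = (survival_exponential(t, *up) - survival_exponential(t, *dn)) / (2 * h)
        total += grad ** 2 * var      # orthogonality: no cross terms
    return total


# Usage with hypothetical totals R10 = R12 = 20, R21 = 18 at theta = 0.5, rho1 = rho2 = 1.
v = observed_info_variances(20, 20, 18, 0.5, 1.0, 1.0)
print(delta_method_variance(3.0, (0.5, 1.0, 1.0), v))
```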
(2.25)

(2.26)

(2.27)

(2.28)

(2.29)

(2.30)

(2.31)

(2.32)


3. Asymptotic Renewal Estimator

This subsection describes an asymptotic renewal estimator (A.R.E.) for $P(D > t)$. A conditioning argument yields the following equation for $P(D > t)$:


$$P(D > t) = g(t) + (1-\theta)\int_0^t (F_1 * F_2)(ds)\, P(D > t - s) \qquad (2.33)$$

with

$$g(t) = \bar F_1(t) + (1-\theta)\int_0^t F_1(ds)\, \bar F_2(t - s). \qquad (2.34)$$

Thus $P(D > t)$ satisfies a renewal-type equation with defective inter-renewal distribution

$$L(t) = (1-\theta)(F_1 * F_2)(t) \qquad (2.35)$$


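where $F_i$ is the sojourn time distribution in state $i$, $\bar F_i(t) = 1 - F_i(t)$, and $(F_1 * F_2)(t)$ denotes the convolution of $F_1$ and $F_2$. Following Feller [Ref. 6], let $k$ be such that

$$\int_0^\infty e^{kt} L(dt) = (1-\theta)\,\phi_1(k)\,\phi_2(k) = 1 \qquad (2.36)$$

where

$$\phi_i(k) = \int_0^\infty e^{kt} F_i(dt). \qquad (2.37)$$

Then, under certain integrability conditions, if $F_1 * F_2$ is not arithmetic,

$$\lim_{t\to\infty} e^{kt} P(D > t) = \frac{\nu}{\mu} \qquad (2.38)$$

where

$$\mu = \int_0^\infty s\, e^{ks} L(ds) \qquad (2.39)$$

and

$$\nu = \int_0^\infty e^{ks}\, g(s)\, ds. \qquad (2.40)$$

Substituting (2.34) and integrating by parts gives

$$\nu = \int_0^\infty e^{ks}\left[\bar F_1(s) + (1-\theta)\int_0^s F_1(du)\,\bar F_2(s-u)\right] ds = \frac{1}{k}\left\{\phi_1(k) - 1 + (1-\theta)\,\phi_1(k)\left[\phi_2(k) - 1\right]\right\}.$$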
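Since

$$(1-\theta)\,\phi_1(k)\,\phi_2(k) = 1,$$

it follows that

$$\nu = \frac{1}{k}\left[\phi_1(k) - 1 + 1 - (1-\theta)\,\phi_1(k)\right] = \frac{\theta\,\phi_1(k)}{k},$$

so that, for large $t$,

$$P(D > t) \approx \frac{\theta\,\phi_1(k)}{k\,\mu}\, e^{-kt}. \qquad (2.41)$$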


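Let $\hat F_i$ be the Kaplan-Meier estimate of $F_i$ and $\hat\theta$ the maximum likelihood estimate of $\theta$ [Ref. 5: p. 10], and put

$$\hat\phi_i(k) = \int_0^\infty e^{ks}\, \hat F_i(ds). \qquad (2.42)$$

The asymptotic renewal estimator $\hat P_3(D > t)$ [Ref. 5: p. 11 eqn 3.11] of the survival probability $P(D > t)$ is

$$\hat P_3(D > t) = \frac{\hat\nu}{\hat\mu}\, e^{-\hat k t} \qquad (2.43)$$

where $\hat k$ is the solution to the equation

$$(1-\hat\theta)\,\hat\phi_1(\hat k)\,\hat\phi_2(\hat k) = 1, \qquad (2.44)$$

$$\hat\mu = (1-\hat\theta)\int_0^\infty s\, e^{\hat k s}\, (\hat F_1 * \hat F_2)(ds), \qquad (2.45)$$

$$\hat\nu = \frac{\hat\theta\,\hat\phi_1(\hat k)}{\hat k}. \qquad (2.46)$$

In the simulation, $\hat k$ is obtained by the golden section search method. The asymptotic renewal estimator is undefined if all the sojourn times for a particular state are censored, because the Kaplan-Meier estimator is not defined in that case.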
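The sketch below covers (2.42) through (2.46). It assumes the estimated sojourn distributions are supplied as discrete atoms with total mass one, as a Kaplan-Meier estimate is. It also assumes $0 < \hat\theta < 1$. Golden section search is applied to the squared residual of (2.44), which is unimodal because $(1-\hat\theta)\hat\phi_1(k)\hat\phi_2(k)$ increases in $k$. The thesis does not say how its search was set up, and the names here are hypothetical.

```python
import math


def golden_section_min(f, a, b, tol=1e-10):
    """Minimize a unimodal function f on [a, b] by golden section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def are_estimate(t, theta, atoms1, atoms2):
    """Asymptotic renewal estimate (2.43) and k_hat; atoms are (x, mass) pairs."""
    def phi(k, atoms):                   # (2.42) for a discrete distribution
        return sum(w * math.exp(k * x) for x, w in atoms)

    def dphi(k, atoms):                  # derivative of phi with respect to k
        return sum(w * x * math.exp(k * x) for x, w in atoms)

    def resid(k):                        # left side of (2.44) minus 1
        return (1 - theta) * phi(k, atoms1) * phi(k, atoms2) - 1.0

    hi = 1.0
    while resid(hi) <= 0:                # expand until the root of (2.44) is bracketed
        hi *= 2.0
        if hi > 1e3:
            raise ValueError("could not bracket k")
    k = golden_section_min(lambda v: resid(v) ** 2, 0.0, hi)
    p1, p2 = phi(k, atoms1), phi(k, atoms2)
    mu = (1 - theta) * (dphi(k, atoms1) * p2 + p1 * dphi(k, atoms2))   # (2.45)
    nu = theta * p1 / k                                                # (2.46)
    return nu / mu * math.exp(-k * t), k
```

The derivative of $\hat\phi_1\hat\phi_2$ with respect to $k$ is used for $\hat\mu$. This works because the integral of $s\,e^{ks}$ against the convolution in (2.45) equals $\frac{d}{dk}[\hat\phi_1(k)\hat\phi_2(k)]$, so the convolution never has to be formed explicitly.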


C. CONFIDENCE INTERVAL PROCEDURES 

A confidence interval for an unknown parameter gives both an indication of the 
numerical value of the unknown parameter and a measure of how confident we are of 
that numerical value. Two statistics $L$ and $U$ form a $(1-\alpha)100\%$ confidence interval for $\theta$ if, under repeated random sampling, $L \le \theta \le U$ holds $(1-\alpha)100\%$ of the time. A confidence interval procedure for an unknown parameter $\theta$ can also be used to make a decision about $\theta$, as in classical hypothesis testing or decision making, or to indicate the accuracy and variability of a point estimator $\hat\theta$.

This section describes confidence interval procedures for the three estimators of $P(D > t)$. It begins with the procedure for the Kaplan-Meier estimator (K.M.E.) and then discusses the procedures for the maximum likelihood estimator (M.L.E.) and the asymptotic renewal estimator (A.R.E.).

A preliminary transformation that approximately symmetrizes the sampling distribution of the estimator is often beneficial [Ref. 7]. This study considers two transformations, the arc-sine and the log, which tend to stabilize the variance and approximately symmetrize the data. These transformations were suggested by the work of Gaver and Miller [Ref. 8]. Because the individuals are independent, the number surviving a fixed time t would have a binomial distribution if there were no censoring, and the logarithmic and arc-sine transformations have been beneficial in that case.

The confidence intervals for the Kaplan-Meier estimator and the maximum likeli- 
hood estimator are the asymptotic normal confidence intervals using the transformed 
estimator. 

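The confidence interval for the arc-sine transformed estimator is computed as follows. Let $\hat S(t)$ be the estimator of $S(t) = P(D > t)$. Since

$$\frac{d}{dx}\sin^{-1}\sqrt{x} = \frac{1}{2\sqrt{x(1-x)}}, \qquad (2.47)$$

a Taylor expansion yields

$$\sin^{-1}\sqrt{\hat S(t)} \approx \sin^{-1}\sqrt{S(t)} + \frac{1}{2\sqrt{S(t)\,(1 - S(t))}}\left(\hat S(t) - S(t)\right). \qquad (2.48)$$

Hence the approximate variance of $\sin^{-1}\sqrt{\hat S(t)}$ is

$$\mathrm{Var}\left[\sin^{-1}\sqrt{\hat S(t)}\right] \approx \frac{1}{4\,(1 - \hat S(t))\,\hat S(t)}\,\mathrm{Var}[\hat S(t)]. \qquad (2.49)$$

If $\hat S(t) = 1$, we set $\mathrm{Var}[\sin^{-1}\sqrt{\hat S(t)}]$ equal to 0. A $(1-\alpha)100\%$ confidence interval for $\sin^{-1}\sqrt{S(t)}$ is then

$$\sin^{-1}\sqrt{\hat S(t)} \pm z_{1-\alpha/2}\sqrt{\mathrm{Var}\left[\sin^{-1}\sqrt{\hat S(t)}\right]} \qquad (2.50)$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)100\%$ point of a standard normal distribution.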
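A confidence interval for the log transformed estimator of $S(t)$ is computed as follows. Since

$$\frac{d}{dx}\ln x = \frac{1}{x}, \qquad (2.51)$$

a Taylor expansion yields

$$\ln\hat S(t) \approx \ln S(t) + \frac{1}{S(t)}\left(\hat S(t) - S(t)\right). \qquad (2.52)$$

Thus the approximate variance of $\ln\hat S(t)$ is

$$\mathrm{Var}[\ln\hat S(t)] \approx \frac{1}{\hat S(t)^2}\,\mathrm{Var}[\hat S(t)]. \qquad (2.53)$$

If $\hat S(t) = 1$, we set $\mathrm{Var}[\hat S(t)] = 0$. A $(1-\alpha)100\%$ confidence interval for $\ln S(t)$ is

$$\ln\hat S(t) \pm z_{1-\alpha/2}\sqrt{\mathrm{Var}[\ln\hat S(t)]} \qquad (2.54)$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)100\%$ point of a standard normal distribution.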
1. Asymptotic Normal Confidence Intervals for K.M.E.

The asymptotic variance of the Kaplan-Meier estimator $\hat P_1(D > t)$ is given by equation (2.2). The asymptotic normal confidence intervals for $\sin^{-1}\sqrt{\hat P_1(D > t)}$ and $\ln\hat P_1(D > t)$ are evaluated using equations (2.50) and (2.54) respectively. The corresponding confidence intervals for $P(D > t)$ are formed by inverse transformation. A lower limit below 0 is set to 0, and an upper limit above 1 is set to 1.
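The sketch below computes the two transformed intervals and applies the inverse transformation with clipping. Two handling choices are assumptions the text does not spell out. On the arc-sine scale the interval is clipped to $[0, \pi/2]$ before squaring the sine, since a negative lower angle would otherwise map back to a positive probability. A zero estimate is given a degenerate log interval. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def transformed_ci(s_hat, var_s, alpha=0.10, method="log"):
    """(1 - alpha) interval for S(t) via (2.49)-(2.50) or (2.53)-(2.54),
    mapped back to the probability scale and clipped to [0, 1]."""
    z = NormalDist().inv_cdf(1 - alpha / 2)           # z_{1 - alpha/2}
    if method == "arcsine":
        y = math.asin(math.sqrt(s_hat))
        # (2.49); variance taken as 0 at the boundary (the text does so for S_hat = 1)
        var_y = 0.0 if s_hat <= 0 or s_hat >= 1 else var_s / (4 * s_hat * (1 - s_hat))
        half = z * math.sqrt(var_y)
        lo, hi = max(y - half, 0.0), min(y + half, math.pi / 2)   # keep angle in range
        return math.sin(lo) ** 2, math.sin(hi) ** 2
    if method == "log":
        if s_hat <= 0:
            return 0.0, 0.0               # assumption: log undefined, degenerate interval
        var_y = 0.0 if s_hat >= 1 else var_s / s_hat ** 2          # (2.53)
        half = z * math.sqrt(var_y)
        y = math.log(s_hat)
        return max(math.exp(y - half), 0.0), min(math.exp(y + half), 1.0)
    raise ValueError("method must be 'arcsine' or 'log'")


# Usage: an estimate of 0.5 with estimated variance 0.01, 90% intervals.
print(transformed_ci(0.5, 0.01, 0.10, "arcsine"), transformed_ci(0.5, 0.01, 0.10, "log"))
```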

2. Asymptotic Normal Confidence Intervals for M.L.E.

The asymptotic variance of the maximum likelihood estimator $\hat P_2(D > t)$ is given by equation (2.24). The asymptotic normal confidence intervals for $\sin^{-1}\sqrt{\hat P_2(D > t)}$ and $\ln\hat P_2(D > t)$ can be constructed using equations (2.50) and (2.54) respectively. Confidence intervals for $P(D > t)$ are then obtained by inverse transformation. A lower limit below 0 is set to 0, and an upper limit above 1 is set to 1.

3. Jackknife Procedure for A.R.E. 

The jackknife technique was first introduced by Quenouille [Ref. 9] and later used by Tukey [Ref. 10] for bias reduction and robust interval estimation; a review can be found in Miller [Ref. 11]. The jackknife does various jobs fairly well. In jackknifing, however, it is desirable to avoid sampling distributions with (i) abrupt ends or (ii) one or more straggling tails, and probably also those that are strongly asymmetric [Ref. 12]. Confidence intervals for the asymptotic renewal estimator are obtained by applying the jackknife procedure to the arc-sine and log transformed estimates.

The jackknife confidence interval for the asymptotic renewal estimator $\hat P_3(D > t)$ is computed as follows.

1. Generate data for N individuals.

2. Compute $\hat P_3(D > t)$ using all the data.

3. Transform $\hat P_3(D > t)$ into $\ln\hat P_3(D > t)$ and $\sin^{-1}\sqrt{\hat P_3(D > t)}$, denoted $YL_{all}$ and $YS_{all}$ respectively. Divide the N individuals into n subgroups of N/n individuals each.

4. Compute $\hat P_3(D > t)$ leaving out all data of the $i$th subgroup, and transform it into $\ln\hat P_3(D > t)$ and $\sin^{-1}\sqrt{\hat P_3(D > t)}$, denoted $yl_i$ and $ys_i$ respectively.

5. Compute the pseudo-values $YL_{*i}$ and $YS_{*i}$:

$$YL_{*i} = n\,YL_{all} - (n-1)\,yl_i \qquad (2.55)$$

$$YS_{*i} = n\,YS_{all} - (n-1)\,ys_i \qquad (2.56)$$

6. Compute the average of the pseudo-values, which is the jackknifed estimate of the transformed asymptotic renewal estimator (denoted $YL_*$ and $YS_*$):

$$YL_* = \frac{1}{n}\sum_{i=1}^{n} YL_{*i} \qquad (2.57)$$

$$YS_* = \frac{1}{n}\sum_{i=1}^{n} YS_{*i} \qquad (2.58)$$

and compute the variance of this average (denoted $VL^2$ and $VS^2$):

$$VL^2 = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(YL_{*i} - YL_*\right)^2 \qquad (2.59)$$

$$VS^2 = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(YS_{*i} - YS_*\right)^2 \qquad (2.60)$$

7. Compute approximate two-sided $(1-\alpha)100\%$ confidence intervals for the transformed estimators:

$$YL_* \pm t_{1-\alpha/2}(n-1)\sqrt{VL^2} \qquad (2.61)$$

and

$$YS_* \pm t_{1-\alpha/2}(n-1)\sqrt{VS^2} \qquad (2.62)$$

where $t_{1-\alpha/2}(n-1)$ is the $(1-\alpha/2)100\%$ point of Student's t distribution with $n-1$ degrees of freedom.

8. Obtain confidence intervals for $P(D > t)$ by inverse transformation. A lower limit below 0 is set to 0, and an upper limit above 1 is set to 1.
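Steps 2 through 8 can be sketched generically, with the estimator passed in. For the thesis the estimator would be the A.R.E. of the previous subsection. The Student t point is taken as an argument rather than computed, because the Python standard library has no t quantile function; `scipy.stats.t.ppf(1 - alpha/2, n - 1)` is one possible source. The function name and argument layout are hypothetical.

```python
import math


def jackknife_ci(groups, estimator, transform, inverse, t_quantile):
    """Grouped jackknife on a transformed scale (steps 2-8 above).

    groups     -- list of n subgroups, each a list of individuals' data
    estimator  -- maps a flat list of individuals to an estimate of P(D > t)
    transform  -- e.g. math.log, or lambda p: math.asin(math.sqrt(p))
    inverse    -- back-transform (for arc-sine, clip the angle before sin**2)
    t_quantile -- t_{1-alpha/2}(n - 1), supplied by the caller
    """
    n = len(groups)
    y_all = transform(estimator([x for g in groups for x in g]))        # steps 2-3
    pseudo = []
    for i in range(n):                                                  # step 4
        rest = [x for j, g in enumerate(groups) if j != i for x in g]
        pseudo.append(n * y_all - (n - 1) * transform(estimator(rest)))  # (2.55)/(2.56)
    y_bar = sum(pseudo) / n                                             # (2.57)/(2.58)
    var = sum((p - y_bar) ** 2 for p in pseudo) / (n * (n - 1))         # (2.59)/(2.60)
    half = t_quantile * math.sqrt(var)                                  # (2.61)/(2.62)
    lo = min(max(inverse(y_bar - half), 0.0), 1.0)                      # step 8 clipping
    hi = min(max(inverse(y_bar + half), 0.0), 1.0)
    return y_bar, var, (lo, hi)
```

With singleton groups and the sample mean as the estimator, each pseudo-value equals the corresponding observation. The jackknife variance then reduces to the familiar $s^2/n$, which is a convenient check.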


III. ANALYSIS OF SIMULATION RESULTS FOR CONFIDENCE
INTERVAL PROCEDURES


A. SIMULATION 

All simulations were carried out on an IBM 3033AP computer at the Naval Postgraduate School using the LLRANDOM II random number generator package [Ref. 13]. Plots of simulated estimates and confidence intervals were produced with GRAFSTAT, an experimental APL package that the Naval Postgraduate School uses under a test agreement with the IBM Watson Research Center, Yorktown Heights, NY.


The data for the simulation experiments are generated as follows:
1. An individual starts in state 1 at time 0.
2. An exponential censoring time with mean $1/c$ is generated.
3. An exponential sojourn time in state 1 with mean $1/\rho_1$ is generated.
4. The sojourn time and censoring time are compared. If the sojourn time is smaller, it is recorded with an uncensored index '0'. If the censoring time is smaller, the sojourn time truncated at the censoring time is recorded with a censored index '1'; in that case the death time is recorded as the truncated time with a censored index '1', and the simulation for this individual is complete.
5. If the process is not censored in state 1, a uniform random number is generated and compared with $\theta$. If it is less than $\theta$, the process jumps to state 0, the uncensored death time is recorded with index '0', and the simulation for this individual is complete. Otherwise the process jumps to state 2.
6. An exponential sojourn time in state 2 with mean $1/\rho_2$ is generated. The total time (sojourn time in state 1 plus sojourn time in state 2) is compared with the censoring time, with the same actions as above.
7. If the process is not censored by the end of the sojourn in state 2, it jumps to state 1 and continues until an uncensored or censored death occurs. The time is recorded and the next individual is started.
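Steps 1 through 7 translate directly into a short generator. This sketch uses Python's `random` module in place of the LLRANDOM II package, with an arbitrary seed, so its random streams will not reproduce the thesis runs. The record layout and the name `simulate_individual` are assumptions.

```python
import random


def simulate_individual(rng, rho1, rho2, theta, c):
    """Generate one individual's censored path following steps 1-7.

    Returns the death time with its index (0 = uncensored, 1 = censored),
    the recorded sojourns per state as (length, index), the transition
    counts, and the total time spent in each state.
    """
    censor = rng.expovariate(c)                  # step 2: rate c, mean 1/c
    clock, state = 0.0, 1                        # step 1
    sojourns = {1: [], 2: []}
    K = {"K10": 0, "K12": 0, "K21": 0}
    T = {1: 0.0, 2: 0.0}
    while True:
        s = rng.expovariate(rho1 if state == 1 else rho2)   # steps 3 and 6
        if clock + s > censor:                   # steps 4 and 6: censored in this sojourn
            cut = censor - clock
            sojourns[state].append((cut, 1))
            T[state] += cut
            return {"death": (censor, 1), "sojourns": sojourns, "K": K, "T": T}
        sojourns[state].append((s, 0))
        T[state] += s
        clock += s
        if state == 1:
            if rng.random() < theta:             # step 5: absorbed in state 0
                K["K10"] += 1
                return {"death": (clock, 0), "sojourns": sojourns, "K": K, "T": T}
            K["K12"] += 1
            state = 2
        else:                                    # step 7: return to state 1
            K["K21"] += 1
            state = 1


rng = random.Random(12345)                       # arbitrary seed
data = [simulate_individual(rng, 1.0, 1.0, 0.5, 0.5) for _ in range(50)]
```

One call per individual, repeated N = 50 times, gives the data for one super-replication. The `K` and `T` entries feed the estimators (2.4) through (2.9) directly.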


This procedure continues until data for N individuals have been generated. From these data the Kaplan-Meier estimate $\hat P_1(D > t)$, the maximum likelihood estimate $\hat P_2(D > t)$, the asymptotic renewal estimate $\hat P_3(D > t)$, and their respective 90% and 80% two-sided confidence intervals are computed. This completes one super-replication. The simulation is repeated for SR = 500 super-replications, each using a different seed to generate the data. The simulated model uses $\rho_1 = 1$, $\rho_2 = 1$, $\theta = 0.5$, and $c = 1/2, 1/5, 1/10$, and $1/100$, and each super-replication generates data for N = 50 individuals.


B. ANALYSIS 

This section reports results from the simulation experiments. Appendix A gives the true survival probability, obtained from equation (2.20), at the times considered: t = 1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, and 15.0.

Tables 2 through 5 of Appendix B report the number of super-replications that still have defined Kaplan-Meier estimates at various times t. The column headed KM0 gives this number for the death time survival function. The columns headed KM1 and KM2 give it for the survival functions of the sojourn times in state 1 and state 2, respectively. These numbers show the effect of censoring on the Kaplan-Meier estimate. As expected, increasing the mean time to censoring increases the number of super-replications with defined estimates. In all cases the sojourn times in states 1 and 2 are less heavily censored than the death times.

Figures 1 through 3 compare the methods investigated for the Kaplan-Meier estimator. They plot, respectively, the difference $\bar S(t) - P(D > t)$ between the mean estimated survival probability and the true survival probability, the relative difference $(\bar S(t) - P(D > t))/P(D > t)$, and the relative root mean square error $RMSE/P(D > t)$. The numerical results are recorded in Tables 6 through 17. The means and root mean square errors (RMSE) over SR = 500 super-replications are computed as follows:


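$$\bar S(t) = \frac{1}{SR}\sum_{i=1}^{SR}\hat S_i(t) \qquad (3.1)$$

and

$$RMSE = \left[\frac{1}{SR}\sum_{i=1}^{SR}\left(\hat S_i(t) - S(t)\right)^2\right]^{1/2} \qquad (3.2)$$

where $\hat S_i(t)$ is the point estimate of the true value $S(t)$ at time $t$ in the $i$th super-replication and SR is the number of super-replications.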
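For one estimator at one time t, the quantities plotted in Figures 1 through 3 take only a few lines. The sketch below uses a hypothetical function name and takes the SR point estimates as a list.

```python
import math


def summarize_point_estimates(estimates, true_value):
    """Mean (3.1), RMSE (3.2), and the plotted difference, relative
    difference, and relative RMSE at one time t."""
    sr = len(estimates)
    mean = sum(estimates) / sr                                            # (3.1)
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / sr)  # (3.2)
    diff = mean - true_value
    return {"mean": mean, "rmse": rmse, "diff": diff,
            "rel_diff": diff / true_value, "rel_rmse": rmse / true_value}


# Usage: two super-replications straddling a true value of 0.2.
print(summarize_point_estimates([0.1, 0.3], 0.2))
```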

If a Kaplan-Meier estimate is undefined, its value is taken to be its last defined value. The modifications MOD 1 and MOD 2 are described in Chapter 2. Because the unmodified Kaplan-Meier survival function may not tend to 0 as $t \to \infty$, it tends to overestimate the true survival probability. The modifications make the undefined Kaplan-Meier survival function honest, so they do well for large t and are less variable than the unmodified K.M.E. However, they appear to bias the estimates at moderate times t. All of these methods improve as the mean censoring time increases, and MOD 1 does slightly better than MOD 2.

Figures 4 through 9 compare the three estimators, with the differences, relative differences, and relative RMSE's computed and plotted as before. Figures 4 through 6 show results for the Kaplan-Meier and asymptotic renewal estimators based on the unmodified Kaplan-Meier estimator. Figures 7 through 9 show the results when the Kaplan-Meier estimates in both the K.M.E. and the A.R.E. are modified by MOD 1. The numerical results are recorded in Appendix C. As expected, the maximum likelihood estimator (M.L.E.), which uses the most correct information about the model, tends to have the smallest relative RMSE's and the smallest differences between estimated and true means. In the case of greatest censoring, the M.L.E. shows a slight bias.

The Kaplan-Meier estimator (K.M.E.), which uses the least information about the model, does well for small times even in the case of greatest censoring. For large times it is worse than the others because of the undefined estimates. It has the largest mean square errors of the three estimators, so the K.M.E. tends to be more variable than the other two.

Not surprisingly, the asymptotic renewal estimator (A.R.E.), which relies on an asymptotic exponential approximation to the probability, has large relative RMSE's and large differences between estimated and true means for small times t. As t increases, both decrease. Tables 6 through 9 indicate that for moderate to large times the A.R.E. has RMSE's comparable to or smaller than those of the M.L.E. The time at which they become comparable depends on the amount of censoring: the larger the mean censoring time, the sooner this happens.

Tables 6 through 12 indicate that modifying the Kaplan-Meier estimate of the death time survival function improves it for large times t but creates a bias for moderate times t. Using modified Kaplan-Meier estimates of the sojourn time survival functions in states 1 and 2 does not improve the asymptotic renewal estimator.

Figures 10 through 17 compare the confidence interval procedures of the three estimators. Three measures are used: the coverage fraction (CVR), the average half length (AHL) of the confidence interval, and the standard deviation (SHL) of the half lengths. The AHL is the sum of the half lengths over all replications divided by the number of replications (SR). The SHL is the square root of the sum of squared deviations of the individual half lengths from the AHL, divided by (SR - 1). Among procedures that achieve the desired coverage rate (0.90 or 0.80), the one with the smallest AHL is preferred. A small SHL, indicating a stable procedure, is also preferred.
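The three measures, together with the too-high and too-low counts reported in Appendix D, can be sketched as follows. An interval is counted as "too high" when it lies entirely above the true value. The function name is hypothetical.

```python
import math


def interval_summary(intervals, true_value):
    """Coverage fraction (CVR), average half length (AHL), standard deviation
    of half lengths (SHL), and counts of intervals too high / too low."""
    sr = len(intervals)
    cover = sum(1 for lo, hi in intervals if lo <= true_value <= hi)
    too_high = sum(1 for lo, hi in intervals if lo > true_value)
    too_low = sum(1 for lo, hi in intervals if hi < true_value)
    halves = [(hi - lo) / 2 for lo, hi in intervals]
    ahl = sum(halves) / sr
    shl = math.sqrt(sum((h - ahl) ** 2 for h in halves) / (sr - 1))   # divisor SR - 1
    return {"cvr": cover / sr, "ahl": ahl, "shl": shl,
            "too_high": too_high, "too_low": too_low}


# Usage: three intervals against a true value of 0.5.
print(interval_summary([(0.0, 1.0), (0.2, 0.4), (0.6, 0.8)], 0.5))
```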

For each procedure, the number of intervals covering the true value $P(D > t)$ is recorded, along with the number of intervals that are too high or too low. These results appear in Tables 18, 20, 22, 24, 26, 28, 30, and 32 (Appendix D), with the coverage fraction given in parentheses next to each coverage count. A well-performing $(1-\alpha)100\%$ procedure should cover the true value about $(1-\alpha)100\%$ of the time.

A 95% confidence interval for the coverage fraction is computed as $P \pm 1.96\,[P(1-P)/SR]^{1/2}$. Here $P$ is the proportion of $(1-\alpha)100\%$ intervals that should cover the true value of $P(D > t)$, that is, the nominal level $1-\alpha$, and SR is the number of super-replications. If an 80% (90%) procedure is working well, then between 382 (436) and 418 (463) of the 500 intervals should cover the true value. These counts correspond to coverage fractions between 0.7649 and 0.8351 for the 80% procedure and between 0.8737 and 0.9263 for the 90% procedure.

The average half lengths of the confidence intervals for $S(t)$ are reported in Tables 19, 21, 23, 25, 27, 29, 31, and 33 (Appendix D), with the standard deviation of the half length in parentheses below each average. A well-performing estimator should have a confidence interval with both the correct coverage fraction and a small average half length.
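The band limits quoted above follow from the normal approximation to the binomial, with P set to the nominal level. The short check below uses a hypothetical helper name and reproduces the fractions 0.7649, 0.8351, 0.8737, and 0.9263.

```python
import math


def coverage_band(nominal, sr=500, z=1.96):
    """95% band for the observed coverage fraction of a procedure working at its nominal level."""
    half = z * math.sqrt(nominal * (1 - nominal) / sr)
    return nominal - half, nominal + half


for level in (0.80, 0.90):
    lo, hi = coverage_band(level)
    print(f"{level:.2f}: {lo:.4f} to {hi:.4f} ({lo * 500:.1f} to {hi * 500:.1f} intervals)")
```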

Figures 10 through 13 present the coverage fraction and the average half length of the log-transformed confidence intervals for each estimator. Figures 14 through 17 present the same quantities for the arc-sine transformed intervals. The two horizontal lines in each coverage plot mark the 95% confidence interval for the coverage fraction.

The following remarks concern the confidence intervals obtained with the arc-sine transformation. The asymptotic normal confidence intervals for the M.L.E. have the correct coverage for small to moderate times but tend to undercover slightly for large times. Those for the K.M.E. have the correct coverage for small times but undercover for moderate to large times because of the undefined estimates; their coverage improves as the mean censoring time increases. The confidence intervals for the asymptotic renewal estimate tend to undercover. Using a modified Kaplan-Meier estimate makes very little difference to the results.

The following remarks concern the confidence intervals obtained with the log transformation. Once again the confidence intervals for the K.M.E. undercover for moderate and large times t, and their coverage is slightly worse than with the arc-sine transformation. The confidence intervals for the M.L.E. have the correct coverage and the smallest AHL. The confidence intervals for the jackknifed A.R.E. have the correct coverage except for the 80% interval at t = 1 with c = 1/2, but their average half lengths are larger than those of the M.L.E. intervals. Using a modified Kaplan-Meier estimate makes very little difference to the results.
IV. CONCLUSIONS


This thesis considers the problem of estimating the survival probability $P(D > t)$ for the first passage time to state 0 of a semi-Markov process using censored data. Simulation is used to study the small sample behavior of three estimators and their confidence interval procedures.

The first estimator is the Kaplan-Meier estimator of the first passage times to state 0. Both the unmodified Kaplan-Meier estimator and two modifications that make the estimated distribution honest, MOD 1 and MOD 2, are considered. The second is the maximum likelihood estimator. The third, the asymptotic renewal estimator, uses an exponential approximation to the survival function.


The following conclusions are drawn from the simulation experiment.

1. The modified Kaplan-Meier estimators MOD 1 and MOD 2 based on the first passage times to state 0 have a smaller bias for large times than the unmodified K.M.E. In the medium range of times, however, both have a larger bias than the unmodified K.M.E. MOD 1 is slightly better than MOD 2.

2. Modifying the Kaplan-Meier estimates of the sojourn time distributions in the asymptotic renewal estimate does not improve its performance.

3. For the Kaplan-Meier estimator of the first passage times to state 0, asymptotic normal confidence intervals based on the arc-sine transformation have slightly better coverage than those based on the log transformation.

4. For the maximum likelihood estimator and the asymptotic renewal estimator, confidence intervals based on the log transformed estimators are preferred to those based on the arc-sine transformed estimators.

5. The confidence intervals for the jackknifed log transformed asymptotic renewal estimator have the correct coverage for all but the smallest times. This estimator makes no assumptions about the parametric form of the sojourn time distributions, so it is expected to perform well also when those distributions are not exponential.

6. The confidence intervals for the maximum likelihood estimator have the correct coverage and the smallest average half length. However, this estimator depends on the parametric form of the sojourn time distributions, and if that form is misspecified the M.L.E. can be quite biased (Kim [Ref. 1]).
APPENDIX A. TRUE SURVIVAL PROBABILITY


Table 1. TRUE SURVIVAL PROBABILITY FOR MODEL 
APPENDIX B. NUMBER OF K.M.E. DEFINED AT TIME T


Table 2. NUMBER OF KAPLAN-MEIER ESTIMATES DEFINED AT TIME T: 
C= 1/2





Table 3. NUMBER OF KAPLAN-MEIER ESTIMATES DEFINED AT TIME T: 
C= 1/5

Table 4. NUMBER OF KAPLAN-MEIER ESTIMATES DEFINED AT TIME T:
C= 1/10





Table 5. NUMBER OF KAPLAN-MEIER ESTIMATES DEFINED AT TIME T: 
C= 1/100 
APPENDIX C. STATISTICS OF THREE ESTIMATORS


Table 6. STATISTICS OF THREE ESTIMATORS : UNMOD, C= 1/2 
Table 7. STATISTICS OF THREE ESTIMATORS : UNMOD, C= 1/5


0.6688 0.0043 
0.3540 0.0056 
0.1982 0.0040 
0.1155 0.0022
0.0708 0.0011 
0.0507 0.0005 
0.0435 0.0002 


0.0399 0.0001 
Table 8. STATISTICS OF THREE ESTIMATORS : UNMOD, C= 1/10
Table 9. STATISTICS OF THREE ESTIMATORS : UNMOD, C= 1/100


Table 10. STATISTICS OF THREE ESTIMATORS : MOD 1, C= 1/2
Table 11. STATISTICS OF THREE ESTIMATORS : MOD 1, C= 1/5


0.6688 0.6599 
0.3540 0.3547 
0.1973 0.2004 
0.1046 0.1148 
0.0465 0.0665 
0.0174 0.0358 
0.0055 0.0229 
0.0011 0.0156 
Table 12. STATISTICS OF THREE ESTIMATORS : MOD 1, C= 1/10

Table 13. STATISTICS OF THREE ESTIMATORS : MOD 1, C= 1/100


Table 14. STATISTICS OF THREE ESTIMATORS : MOD 2, C= 1/2
Table 15. STATISTICS OF THREE ESTIMATORS : MOD 2, C= 1/5




0.0048 
0.0068 
7 0.0065 
0.0061 
0.00-16 
0.0022 
0.0009 
0.0002 
Table 16. STATISTICS OF THREE ESTIMATORS : MOD 2, C= 1/10







Table 17. STATISTICS OF THREE ESTIMATORS : MOD 2, C= 1/100




0.6419 
(527 
0.1950 
0.1088 
0.0612 
0.0347 
0.0198 
0.0114 
APPENDIX D. CONFIDENCE INTERVALS


Table 18. TWO-SIDED 90 % COVERAGE FRACTION(UNMOD, C= 1/2) 
3867 
Table 19. AVERAGE AND STANDARD DEVIATION OF HALF LENGTH(90%
C.I, UNMOD, C= 1/2) 


TIME 
Table 20. TWO-SIDED 80 % COVERAGE FRACTION(UNMOD, C= 1/2)

Table 21. AVERAGE AND STANDARD DEVIATION OF HALF LENGTH(80%
C.I, UNMOD, C= 1/2) 


TIME 
Table 22. TWO-SIDED 90 % COVERAGE FRACTION(UNMOD, C= 1/10)


Table 23. AVERAGE AND STANDARD DEVIATION OF HALF LENGTH(90%
C.I, UNMOD, C= 1/10) 


TIME 
11217 11105 0932 .0903 
Table 24. TWO-SIDED 80 % COVERAGE FRACTION(UNMOD, C= 1/10)


Table 25. AVERAGE AND STANDARD DEVIATION OF HALF LENGTH(80%
C.I, UNMOD, C= 1/10)


TIME 
0872 .0864 0701 .0683 
Table 26. TWO-SIDED 90 % COVERAGE FRACTION(MOD 1, C= 1/2)

Table 27. AVERAGE AND STANDARD DEVIATION OF HALF LENGTH(90%
C.I, MOD 1, C= 1/2) 





TIME 
Table 28. TWO-SIDED 80 % COVERAGE FRACTION(MOD 1, C= 1/2)

Table 29. AVERAGE AND STANDARD DEVIATION OF HALF LENGTH(80%
C.I, MOD 1, C= 1/2) 


TIME 
Table 30. TWO-SIDED 90 % COVERAGE FRACTION(MOD 1, C= 1/10)

Table 31. AVERAGE AND STANDARD DEVIATION OF HALF LENGTH(90%
C.I, MOD 1, C= 1/10)


TIME 
1121. 4105 0718 .0714 .1276 .1187 
(.0064) (.0062) (.0061) (.0060) (0812) (.0704) 
1223 11185 0972 .0953 431  .1222 
3.0 
(.0062) (.0066) (.0042) (.0044) (.0762) (.0457) 


.1123  .1046 О .0837 21/5007 1021 













(.0132) (.0142) (.0101) (.0103) (.0765) (.0379) 















.0988 .0547 .1077 .0761 
Table 32. TWO-SIDED 80 % COVERAGE FRACTION(MOD 1, C= 1/10)

Table 33. AVERAGE AND STANDARD DEVIATION OF HALF LENGTH(80%
C.I, MOD 1, C= 1/10)